Visualizing Digital Collections of Web Archives

Authors

  • Michele C. Weigle
  • Michael L. Nelson
Abstract

An important problem in web archiving is understanding and presenting how a single page changes over time. This matters not only to researchers but can also help educate the general public about the temporal and dynamic nature of the web. A common method for presenting webpage change is to display a set of thumbnails of the mementos, or archived pages. Although this can be a useful way to display mementos, a problem arises when there are too many thumbnails to display. For example, the Internet Archive has more than 17,000 mementos for cnn.com over a 14-year span. Even if the Internet Archive had thumbnails for all of these mementos, some form of sampling would be necessary, because the cognitive load of processing them all is beyond what a user can handle. Our goal in this work was to develop tools that implement thumbnail summarization for TimeMaps. To do this, we have developed the following tools:

  • a web service that allows anyone to view a TimeMap using thumbnail summarization,
  • a Wayback add-on that allows any archive using Wayback to provide this service for its users,
  • an embeddable version that lets web page authors embed an overview of past versions of their page into the live web version of the page itself; this will help to further the integration of the live web with the past web.
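
The sampling step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say how the authors select mementos, so this is not their method: it simply picks a fixed number of mementos spread evenly by position across a chronologically sorted TimeMap. The function name `sample_mementos` and the use of 14-digit `YYYYMMDDhhmmss` timestamp strings (which sort chronologically as plain strings) are assumptions made for illustration.

```python
def sample_mementos(timestamps, k):
    """Return up to k timestamps spread evenly across the sorted input.

    Hypothetical helper: `timestamps` is assumed to be a list of 14-digit
    YYYYMMDDhhmmss strings, one per memento in a TimeMap.
    """
    ordered = sorted(timestamps)  # string order equals time order for this format
    n = len(ordered)
    if k <= 0 or n == 0:
        return []
    if k >= n:
        return ordered            # nothing to cut: show every memento
    if k == 1:
        return [ordered[0]]
    # Integer arithmetic keeps the first and last memento and spaces
    # the remaining picks evenly by index between them.
    return [ordered[i * (n - 1) // (k - 1)] for i in range(k)]


# Example: one memento per year over 14 years, reduced to 5 thumbnails.
yearly = [f"{2000 + i}0101000000" for i in range(14)]
picked = sample_mementos(yearly, 5)
```

Sampling by index favors periods with many captures; a summarization tool might instead choose mementos by how much the page content changed, which this sketch does not attempt.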

Similar resources

Data Mining Web Archives

Many institutions are now building rich, significant archives of web content. Though the number of web archiving programs has grown, access models for these collections have remained focused on URL-based discovery and traditional live-web-style browsing. Given the resources required to build and maintain web archives, finding new forms of access for these collections will help increase use and t...

Towards an Ontology for Describing Archival Resources

Several digital libraries and archives are emerging around the world due to the need to store, organize and make available on the Web a lot of resource collections. However, managing this information poses new challenges in order to overcome traditional data management and information browsing. Semantic Web technologies can improve digital libraries and archives by facilitating metadata storage...

Large-Scale Collections Under The Magnifying Glass: Format Identification For Web Archives

Institutions that perform web crawls in order to gather heritage collections have millions – or even billions – of files encoded in thousands of different formats about which they barely know anything. Many of these heritage institutions are members of the International Internet Preservation Consortium, whose Preservation Working Group decided to address the issues related to format identificat...

Discovering and Visualizing Self-Organizing Communities within Email Archives

With the advent of the Internet, use of electronic mail (email) for both personal communication and professional collaboration has continually increased. The use of email for communication will soon far exceed the use of conventional mail, if it does not already. Email naturally lends itself to being archived because it is stored as a digital medium and as a result, many people have amasse...

Design and Selection Criteria for a National Web Archive

Web archives and Digital Libraries are conceptually similar, as they both store and provide access to digital contents. The process of loading documents into a Digital Library usually requires a strong intervention from human experts. However, large collections of documents gathered from the web must be loaded without human intervention. This paper analyzes strategies to select contents for a n...

Collex: Facets, Folksonomy, and Fashioning the Remixable web

Collex is an online toolset designed to aid students and scholars working in networked archives and federated repositories of humanities materials: a sophisticated COLLections and EXhibits mechanism for the semantic web. It allows users to search, browse, annotate, and tag electronic objects and to repurpose them in illustrated, interlinked essays or exhibits. By saving information about user ...


Publication date: 2015